Dataset statistics
| Number of variables | 10 |
|---|---|
| Number of observations | 5695 |
| Missing cells | 0 |
| Missing cells (%) | 0.0% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 467.2 KiB |
| Average record size in memory | 84.0 B |
Variable types
| Numeric | 10 |
|---|---|
gross_revenue is highly correlated with invoiceno and 3 other fields | High correlation |
recencydays is highly correlated with invoiceno | High correlation |
invoiceno is highly correlated with gross_revenue and 4 other fields | High correlation |
quantity is highly correlated with gross_revenue and 3 other fields | High correlation |
frequency is highly correlated with invoiceno | High correlation |
qtd_return is highly correlated with invoiceno | High correlation |
avg_basket_size is highly correlated with gross_revenue and 2 other fields | High correlation |
avg_unique_basket_size is highly correlated with gross_revenue and 2 other fields | High correlation |
gross_revenue is highly correlated with invoiceno and 1 other fields | High correlation |
invoiceno is highly correlated with gross_revenue and 1 other fields | High correlation |
quantity is highly correlated with gross_revenue and 1 other fields | High correlation |
avg_ticket is highly correlated with qtd_return | High correlation |
qtd_return is highly correlated with avg_ticket | High correlation |
avg_basket_size is highly correlated with avg_unique_basket_size | High correlation |
avg_unique_basket_size is highly correlated with avg_basket_size | High correlation |
gross_revenue is highly correlated with invoiceno and 2 other fields | High correlation |
invoiceno is highly correlated with gross_revenue and 1 other fields | High correlation |
quantity is highly correlated with gross_revenue and 1 other fields | High correlation |
frequency is highly correlated with invoiceno | High correlation |
avg_basket_size is highly correlated with gross_revenue and 1 other fields | High correlation |
customerid is highly correlated with recencydays | High correlation |
gross_revenue is highly correlated with invoiceno and 3 other fields | High correlation |
recencydays is highly correlated with customerid | High correlation |
invoiceno is highly correlated with gross_revenue and 1 other fields | High correlation |
quantity is highly correlated with gross_revenue and 1 other fields | High correlation |
avg_ticket is highly correlated with gross_revenue and 1 other fields | High correlation |
qtd_return is highly correlated with gross_revenue and 1 other fields | High correlation |
avg_basket_size is highly correlated with avg_unique_basket_size | High correlation |
avg_unique_basket_size is highly correlated with avg_basket_size | High correlation |
gross_revenue is highly skewed (γ1 = 21.33147068) | Skewed |
avg_ticket is highly skewed (γ1 = 53.30430577) | Skewed |
qtd_return is highly skewed (γ1 = 51.46514451) | Skewed |
customerid has unique values | Unique |
qtd_return has 4143 (72.7%) zeros | Zeros |
Reproduction
| Analysis started | 2022-09-20 10:01:24.384375 |
|---|---|
| Analysis finished | 2022-09-20 10:03:02.019253 |
| Duration | 1 minute and 37.63 seconds |
| Software version | pandas-profiling v3.1.0 |
| Download configuration | config.json |
customerid
| Distinct | 5695 |
|---|---|
| Distinct (%) | 100.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 31232.13942 |
| Minimum | 12346 |
|---|---|
| Maximum | 83709 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 66.7 KiB |
Quantile statistics
| Minimum | 12346 |
|---|---|
| 5-th percentile | 12699.1 |
| Q1 | 14288.5 |
| median | 16229 |
| Q3 | 18210.5 |
| 95-th percentile | 82731.1 |
| Maximum | 83709 |
| Range | 71363 |
| Interquartile range (IQR) | 3922 |
Descriptive statistics
| Standard deviation | 28408.38395 |
|---|---|
| Coefficient of variation (CV) | 0.909588151 |
| Kurtosis | -0.5185359982 |
| Mean | 31232.13942 |
| Median Absolute Deviation (MAD) | 1962 |
| Skewness | 1.210180524 |
| Sum | 177867034 |
| Variance | 807036278.5 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 17850 | 1 | < 0.1% |
| 16344 | 1 | < 0.1% |
| 12922 | 1 | < 0.1% |
| 82097 | 1 | < 0.1% |
| 16589 | 1 | < 0.1% |
| 13730 | 1 | < 0.1% |
| 16866 | 1 | < 0.1% |
| 82095 | 1 | < 0.1% |
| 82094 | 1 | < 0.1% |
| 82093 | 1 | < 0.1% |
| Other values (5685) | 5685 |
| Value | Count | Frequency (%) |
| 12346 | 1 | |
| 12347 | 1 | |
| 12348 | 1 | |
| 12349 | 1 | |
| 12350 | 1 | |
| 12352 | 1 | |
| 12353 | 1 | |
| 12354 | 1 | |
| 12355 | 1 | |
| 12356 | 1 |
| Value | Count | Frequency (%) |
| 83709 | 1 | |
| 83708 | 1 | |
| 83707 | 1 | |
| 83706 | 1 | |
| 83705 | 1 | |
| 83704 | 1 | |
| 83700 | 1 | |
| 83699 | 1 | |
| 83696 | 1 | |
| 83695 | 1 |
gross_revenue
Real number (ℝ≥0)
High correlation (4 alerts), Skewed
| Distinct | 5461 |
|---|---|
| Distinct (%) | 95.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1862.766266 |
| Minimum | 0.42 |
|---|---|
| Maximum | 280206.02 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0.42 |
|---|---|
| 5-th percentile | 13.635 |
| Q1 | 244.605 |
| median | 639.89 |
| Q3 | 1653.865 |
| 95-th percentile | 5522.757 |
| Maximum | 280206.02 |
| Range | 280205.6 |
| Interquartile range (IQR) | 1409.26 |
Descriptive statistics
| Standard deviation | 7963.897229 |
|---|---|
| Coefficient of variation (CV) | 4.275306771 |
| Kurtosis | 594.1773042 |
| Mean | 1862.766266 |
| Median Absolute Deviation (MAD) | 501.75 |
| Skewness | 21.33147068 |
| Sum | 10608453.88 |
| Variance | 63423659.07 |
| Monotonicity | Not monotonic |
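
Several of the descriptive statistics above are simple functions of one another. The sketch below recomputes a few of them from the values reported for gross_revenue; the variable names are illustrative and are not part of the report, and the inputs are the rounded figures shown in the tables, so agreement is only expected to within rounding.

```python
from math import isclose

# Values copied from the gross_revenue tables in this report.
mean = 1862.766266
std = 7963.897229
q1, q3 = 244.605, 1653.865
minimum, maximum = 0.42, 280206.02
n_observations = 5695

# Derived quantities (names are illustrative).
cv = std / mean                  # coefficient of variation
variance = std ** 2              # variance is the squared standard deviation
iqr = q3 - q1                    # interquartile range
value_range = maximum - minimum  # range
total = mean * n_observations    # sum of all values

# Compare against the figures reported in the tables.
print(isclose(cv, 4.275306771, rel_tol=1e-6))
print(isclose(variance, 63423659.07, rel_tol=1e-6))
print(isclose(iqr, 1409.26, rel_tol=1e-9))
print(isclose(value_range, 280205.6, rel_tol=1e-9))
print(isclose(total, 10608453.88, rel_tol=1e-6))
```

A coefficient of variation above 4 together with a mean far above the median (1862.77 versus 639.89) is consistent with the Skewed alert raised for this variable.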
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7.95 | 9 | 0.2% |
| 1.25 | 8 | 0.1% |
| 4.95 | 8 | 0.1% |
| 2.95 | 8 | 0.1% |
| 12.75 | 7 | 0.1% |
| 3.75 | 7 | 0.1% |
| 1.65 | 7 | 0.1% |
| 7.5 | 6 | 0.1% |
| 5.95 | 6 | 0.1% |
| 4.25 | 6 | 0.1% |
| Other values (5451) | 5623 |
| Value | Count | Frequency (%) |
| 0.42 | 1 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.79 | 1 | < 0.1% |
| 0.84 | 3 | 0.1% |
| 0.85 | 3 | 0.1% |
| 1.07 | 1 | < 0.1% |
| 1.25 | 8 | |
| 1.44 | 1 | < 0.1% |
| 1.65 | 7 | |
| 1.69 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 280206.02 | 1 | |
| 259657.3 | 1 | |
| 194550.79 | 1 | |
| 168472.5 | 1 | |
| 143825.06 | 1 | |
| 124914.53 | 1 | |
| 117379.63 | 1 | |
| 91062.38 | 1 | |
| 81024.84 | 1 | |
| 77183.6 | 1 |
recencydays
| Distinct | 304 |
|---|---|
| Distinct (%) | 5.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 116.7311677 |
| Minimum | 0 |
|---|---|
| Maximum | 373 |
| Zeros | 38 |
| Zeros (%) | 0.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 3 |
| Q1 | 22.5 |
| median | 71 |
| Q3 | 199 |
| 95-th percentile | 337.3 |
| Maximum | 373 |
| Range | 373 |
| Interquartile range (IQR) | 176.5 |
Descriptive statistics
| Standard deviation | 111.5236412 |
|---|---|
| Coefficient of variation (CV) | 0.955388723 |
| Kurtosis | -0.6387979375 |
| Mean | 116.7311677 |
| Median Absolute Deviation (MAD) | 61 |
| Skewness | 0.8160444818 |
| Sum | 664784 |
| Variance | 12437.52255 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 110 | 1.9% |
| 4 | 105 | 1.8% |
| 3 | 98 | 1.7% |
| 2 | 92 | 1.6% |
| 10 | 86 | 1.5% |
| 8 | 82 | 1.4% |
| 9 | 80 | 1.4% |
| 17 | 79 | 1.4% |
| 7 | 78 | 1.4% |
| 22 | 65 | 1.1% |
| Other values (294) | 4820 |
| Value | Count | Frequency (%) |
| 0 | 38 | 0.7% |
| 1 | 110 | |
| 2 | 92 | |
| 3 | 98 | |
| 4 | 105 | |
| 5 | 52 | |
| 7 | 78 | |
| 8 | 82 | |
| 9 | 80 | |
| 10 | 86 |
| Value | Count | Frequency (%) |
| 373 | 23 | |
| 372 | 22 | |
| 371 | 17 | |
| 369 | 4 | 0.1% |
| 368 | 13 | |
| 367 | 16 | |
| 366 | 15 | |
| 365 | 19 | |
| 364 | 11 | |
| 362 | 7 | 0.1% |
invoiceno
| Distinct | 59 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 3.49165935 |
| Minimum | 1 |
|---|---|
| Maximum | 210 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 1 |
| median | 1 |
| Q3 | 4 |
| 95-th percentile | 11.3 |
| Maximum | 210 |
| Range | 209 |
| Interquartile range (IQR) | 3 |
Descriptive statistics
| Standard deviation | 6.868852155 |
|---|---|
| Coefficient of variation (CV) | 1.96721715 |
| Kurtosis | 308.5962454 |
| Mean | 3.49165935 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 13.32902721 |
| Sum | 19885 |
| Variance | 47.18112993 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 2855 | |
| 2 | 831 | 14.6% |
| 3 | 508 | 8.9% |
| 4 | 386 | 6.8% |
| 5 | 243 | 4.3% |
| 6 | 172 | 3.0% |
| 7 | 143 | 2.5% |
| 8 | 98 | 1.7% |
| 9 | 68 | 1.2% |
| 10 | 54 | 0.9% |
| Other values (49) | 337 | 5.9% |
| Value | Count | Frequency (%) |
| 1 | 2855 | |
| 2 | 831 | 14.6% |
| 3 | 508 | 8.9% |
| 4 | 386 | 6.8% |
| 5 | 243 | 4.3% |
| 6 | 172 | 3.0% |
| 7 | 143 | 2.5% |
| 8 | 98 | 1.7% |
| 9 | 68 | 1.2% |
| 10 | 54 | 0.9% |
| Value | Count | Frequency (%) |
| 210 | 1 | |
| 201 | 1 | |
| 124 | 1 | |
| 97 | 1 | |
| 93 | 1 | |
| 91 | 1 | |
| 86 | 1 | |
| 74 | 1 | |
| 63 | 1 | |
| 62 | 1 |
quantity
| Distinct | 57 |
|---|---|
| Distinct (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 9.479543459 |
| Minimum | 1 |
|---|---|
| Maximum | 102 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 1 |
|---|---|
| 5-th percentile | 1 |
| Q1 | 5 |
| median | 9 |
| Q3 | 13 |
| 95-th percentile | 20 |
| Maximum | 102 |
| Range | 101 |
| Interquartile range (IQR) | 8 |
Descriptive statistics
| Standard deviation | 6.638745978 |
|---|---|
| Coefficient of variation (CV) | 0.7003233865 |
| Kurtosis | 18.41590315 |
| Mean | 9.479543459 |
| Median Absolute Deviation (MAD) | 4 |
| Skewness | 2.545936511 |
| Sum | 53986 |
| Variance | 44.07294816 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 7 | 443 | 7.8% |
| 9 | 424 | 7.4% |
| 1 | 420 | 7.4% |
| 10 | 403 | 7.1% |
| 6 | 391 | 6.9% |
| 8 | 386 | 6.8% |
| 5 | 360 | 6.3% |
| 11 | 344 | 6.0% |
| 4 | 294 | 5.2% |
| 12 | 284 | 5.0% |
| Other values (47) | 1946 |
| Value | Count | Frequency (%) |
| 1 | 420 | |
| 2 | 250 | |
| 3 | 248 | |
| 4 | 294 | |
| 5 | 360 | |
| 6 | 391 | |
| 7 | 443 | |
| 8 | 386 | |
| 9 | 424 | |
| 10 | 403 |
| Value | Count | Frequency (%) |
| 102 | 1 | |
| 82 | 1 | |
| 79 | 1 | |
| 74 | 1 | |
| 69 | 1 | |
| 60 | 1 | |
| 59 | 1 | |
| 58 | 1 | |
| 56 | 1 | |
| 54 | 1 |
avg_ticket
| Distinct | 5507 |
|---|---|
| Distinct (%) | 96.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 54.61760545 |
| Minimum | 0.42 |
|---|---|
| Maximum | 77183.6 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0.42 |
|---|---|
| 5-th percentile | 3.465 |
| Q1 | 8.510502308 |
| median | 16.12 |
| Q3 | 22.57508013 |
| 95-th percentile | 76.32 |
| Maximum | 77183.6 |
| Range | 77183.18 |
| Interquartile range (IQR) | 14.06457782 |
Descriptive statistics
| Standard deviation | 1281.098642 |
|---|---|
| Coefficient of variation (CV) | 23.45578191 |
| Kurtosis | 2955.510013 |
| Mean | 54.61760545 |
| Median Absolute Deviation (MAD) | 7.217 |
| Skewness | 53.30430577 |
| Sum | 311047.263 |
| Variance | 1641213.73 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 3.75 | 11 | 0.2% |
| 4.95 | 10 | 0.2% |
| 1.25 | 9 | 0.2% |
| 2.95 | 9 | 0.2% |
| 7.95 | 8 | 0.1% |
| 8.25 | 7 | 0.1% |
| 12.75 | 7 | 0.1% |
| 1.65 | 7 | 0.1% |
| 4.15 | 6 | 0.1% |
| 3.35 | 6 | 0.1% |
| Other values (5497) | 5615 |
| Value | Count | Frequency (%) |
| 0.42 | 2 | |
| 0.535 | 1 | < 0.1% |
| 0.65 | 1 | < 0.1% |
| 0.79 | 1 | < 0.1% |
| 0.8371428571 | 1 | < 0.1% |
| 0.84 | 2 | |
| 0.85 | 3 | |
| 1.002222222 | 1 | < 0.1% |
| 1.02 | 1 | < 0.1% |
| 1.03875 | 1 | < 0.1% |
| Value | Count | Frequency (%) |
| 77183.6 | 1 | |
| 56157.5 | 1 | |
| 13305.5 | 1 | |
| 4453.43 | 1 | |
| 3861 | 1 | |
| 3096 | 1 | |
| 2027.86 | 1 | |
| 1687.2 | 1 | |
| 1377.077778 | 1 | |
| 1001.2 | 1 |
frequency
| Distinct | 1225 |
|---|---|
| Distinct (%) | 21.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.5475706259 |
| Minimum | 0.005449591281 |
|---|---|
| Maximum | 17 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0.005449591281 |
|---|---|
| 5-th percentile | 0.01102941176 |
| Q1 | 0.02492211838 |
| median | 1 |
| Q3 | 1 |
| 95-th percentile | 1 |
| Maximum | 17 |
| Range | 16.99455041 |
| Interquartile range (IQR) | 0.9750778816 |
Descriptive statistics
| Standard deviation | 0.5505967909 |
|---|---|
| Coefficient of variation (CV) | 1.005526529 |
| Kurtosis | 138.7856997 |
| Mean | 0.5475706259 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 4.851371477 |
| Sum | 3118.414715 |
| Variance | 0.3031568261 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 2879 | |
| 2 | 48 | 0.8% |
| 0.0625 | 18 | 0.3% |
| 0.02777777778 | 17 | 0.3% |
| 0.02380952381 | 16 | 0.3% |
| 0.09090909091 | 15 | 0.3% |
| 0.08333333333 | 15 | 0.3% |
| 0.02941176471 | 14 | 0.2% |
| 0.03448275862 | 14 | 0.2% |
| 0.07692307692 | 13 | 0.2% |
| Other values (1215) | 2646 |
| Value | Count | Frequency (%) |
| 0.005449591281 | 1 | < 0.1% |
| 0.005464480874 | 1 | < 0.1% |
| 0.005479452055 | 1 | < 0.1% |
| 0.005494505495 | 1 | < 0.1% |
| 0.005586592179 | 2 | |
| 0.005602240896 | 1 | < 0.1% |
| 0.005617977528 | 2 | |
| 0.00566572238 | 1 | < 0.1% |
| 0.005681818182 | 2 | |
| 0.005698005698 | 3 |
| Value | Count | Frequency (%) |
| 17 | 1 | < 0.1% |
| 4 | 1 | < 0.1% |
| 3 | 5 | 0.1% |
| 2 | 48 | 0.8% |
| 1.142857143 | 1 | < 0.1% |
| 1 | 2879 | |
| 0.75 | 1 | < 0.1% |
| 0.6666666667 | 3 | 0.1% |
| 0.550802139 | 1 | < 0.1% |
| 0.5335120643 | 1 | < 0.1% |
qtd_return
| Distinct | 219 |
|---|---|
| Distinct (%) | 3.8% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 48.01720808 |
| Minimum | 0 |
|---|---|
| Maximum | 80995 |
| Zeros | 4143 |
| Zeros (%) | 72.7% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 1 |
| 95-th percentile | 40 |
| Maximum | 80995 |
| Range | 80995 |
| Interquartile range (IQR) | 1 |
Descriptive statistics
| Standard deviation | 1475.325676 |
|---|---|
| Coefficient of variation (CV) | 30.72493664 |
| Kurtosis | 2713.85495 |
| Mean | 48.01720808 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 51.46514451 |
| Sum | 273458 |
| Variance | 2176585.85 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 0 | 4143 | |
| 1 | 190 | 3.3% |
| 2 | 156 | 2.7% |
| 3 | 107 | 1.9% |
| 4 | 90 | 1.6% |
| 6 | 72 | 1.3% |
| 5 | 64 | 1.1% |
| 12 | 49 | 0.9% |
| 8 | 49 | 0.9% |
| 7 | 48 | 0.8% |
| Other values (209) | 727 | 12.8% |
| Value | Count | Frequency (%) |
| 0 | 4143 | |
| 1 | 190 | 3.3% |
| 2 | 156 | 2.7% |
| 3 | 107 | 1.9% |
| 4 | 90 | 1.6% |
| 5 | 64 | 1.1% |
| 6 | 72 | 1.3% |
| 7 | 48 | 0.8% |
| 8 | 49 | 0.9% |
| 9 | 38 | 0.7% |
| Value | Count | Frequency (%) |
| 80995 | 1 | |
| 74215 | 1 | |
| 9361 | 1 | |
| 9014 | 1 | |
| 8060 | 1 | |
| 4627 | 1 | |
| 3768 | 1 | |
| 3335 | 1 | |
| 2975 | 1 | |
| 2160 | 1 |
avg_basket_size
Real number (ℝ≥0)
High correlation (4 alerts)
| Distinct | 2375 |
|---|---|
| Distinct (%) | 41.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.04377848893 |
| Minimum | 1.347436502 × 10⁻⁵ |
|---|---|
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 1.347436502 × 10⁻⁵ |
|---|---|
| 5-th percentile | 0.001363210568 |
| Q1 | 0.003448275862 |
| median | 0.006622516556 |
| Q3 | 0.0133742249 |
| 95-th percentile | 0.25 |
| Maximum | 1 |
| Range | 0.9999865256 |
| Interquartile range (IQR) | 0.009925949037 |
Descriptive statistics
| Standard deviation | 0.1516964889 |
|---|---|
| Coefficient of variation (CV) | 3.465091934 |
| Kurtosis | 28.77128418 |
| Mean | 0.04377848893 |
| Median Absolute Deviation (MAD) | 0.003846281132 |
| Skewness | 5.278378743 |
| Sum | 249.3184945 |
| Variance | 0.02301182474 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 110 | 1.9% |
| 0.5 | 71 | 1.2% |
| 0.3333333333 | 53 | 0.9% |
| 0.25 | 50 | 0.9% |
| 0.2 | 36 | 0.6% |
| 0.1666666667 | 28 | 0.5% |
| 0.08333333333 | 26 | 0.5% |
| 0.01 | 21 | 0.4% |
| 0.01369863014 | 21 | 0.4% |
| 0.009433962264 | 20 | 0.4% |
| Other values (2365) | 5259 |
| Value | Count | Frequency (%) |
| 1.347436502 × 10⁻⁵ | 1 | |
| 2.469227255 × 10⁻⁵ | 1 | |
| 7.067637289 × 10⁻⁵ | 1 | |
| 7.165376899 × 10⁻⁵ | 1 | |
| 0.0001278118609 | 1 | |
| 0.0001664078101 | 1 | |
| 0.0001676727029 | 1 | |
| 0.0001923816853 | 1 | |
| 0.0002325581395 | 1 | |
| 0.0002336448598 | 1 |
| Value | Count | Frequency (%) |
| 1 | 110 | |
| 0.6666666667 | 1 | < 0.1% |
| 0.5 | 71 | |
| 0.3333333333 | 53 | |
| 0.3 | 1 | < 0.1% |
| 0.25 | 50 | |
| 0.2 | 36 | 0.6% |
| 0.1875 | 1 | < 0.1% |
| 0.1764705882 | 1 | < 0.1% |
| 0.1666666667 | 28 | 0.5% |
avg_unique_basket_size
| Distinct | 1282 |
|---|---|
| Distinct (%) | 22.5% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.1326961552 |
| Minimum | 0.0008976660682 |
|---|---|
| Maximum | 1 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Negative | 0 |
| Negative (%) | 0.0% |
| Memory size | 89.0 KiB |
Quantile statistics
| Minimum | 0.0008976660682 |
|---|---|
| 5-th percentile | 0.005704545455 |
| Q1 | 0.02777777778 |
| median | 0.05555555556 |
| Q3 | 0.1111111111 |
| 95-th percentile | 0.6383116883 |
| Maximum | 1 |
| Range | 0.9991023339 |
| Interquartile range (IQR) | 0.08333333333 |
Descriptive statistics
| Standard deviation | 0.2213233687 |
|---|---|
| Coefficient of variation (CV) | 1.667895866 |
| Kurtosis | 8.64729026 |
| Mean | 0.1326961552 |
| Median Absolute Deviation (MAD) | 0.03514739229 |
| Skewness | 3.021774538 |
| Sum | 755.7046039 |
| Variance | 0.04898403353 |
| Monotonicity | Not monotonic |
Histogram with fixed size bins (bins=50)
| Value | Count | Frequency (%) |
| 1 | 271 | 4.8% |
| 0.5 | 162 | 2.8% |
| 0.3333333333 | 115 | 2.0% |
| 0.07692307692 | 103 | 1.8% |
| 0.1 | 97 | 1.7% |
| 0.1111111111 | 96 | 1.7% |
| 0.25 | 96 | 1.7% |
| 0.2 | 95 | 1.7% |
| 0.07142857143 | 95 | 1.7% |
| 0.09090909091 | 94 | 1.7% |
| Other values (1272) | 4471 |
| Value | Count | Frequency (%) |
| 0.0008976660682 | 1 | |
| 0.001335113485 | 1 | |
| 0.001367989056 | 1 | |
| 0.001386962552 | 1 | |
| 0.001418439716 | 1 | |
| 0.001455604076 | 1 | |
| 0.001479289941 | 1 | |
| 0.001481481481 | 1 | |
| 0.001510574018 | 1 | |
| 0.001533742331 | 1 |
| Value | Count | Frequency (%) |
| 1 | 271 | |
| 0.8333333333 | 1 | < 0.1% |
| 0.8 | 1 | < 0.1% |
| 0.75 | 2 | < 0.1% |
| 0.6666666667 | 9 | 0.2% |
| 0.6428571429 | 1 | < 0.1% |
| 0.6363636364 | 1 | < 0.1% |
| 0.6 | 4 | 0.1% |
| 0.5454545455 | 1 | < 0.1% |
| 0.5263157895 | 1 | < 0.1% |
Spearman's ρ
Spearman's rank correlation coefficient (ρ) measures monotonic correlation between two variables, so it captures nonlinear monotonic relationships better than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation. To calculate ρ for two variables X and Y, divide the covariance of the rank variables of X and Y by the product of their standard deviations.
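The rank-based definition above can be sketched in plain Python. This is a minimal illustration, not the code the report generator uses; the helper names `average_ranks`, `pearson` and `spearman` are assumptions made for this example, and ties receive the average of their rank positions.

```python
from math import sqrt

def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with the value at position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: Pearson's r computed on the rank variables."""
    return pearson(average_ranks(x), average_ranks(y))

# A monotonic but nonlinear relationship: y = x cubed.
x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
rho = spearman(x, y)
r = pearson(x, y)
print(rho, r)
```

Because the relationship is perfectly monotonic, ρ comes out at 1 (up to floating-point error), while r stays below 1 because the relationship is not linear.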
Pearson's r
Pearson's correlation coefficient (r) measures linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, so for a linear relationship the steepness of the line (its angle to the x-axis) does not affect r; only the direction of the slope determines the sign. To calculate r for two variables X and Y, divide the covariance of X and Y by the product of their standard deviations.
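The invariance property can be checked numerically. The following is a small sketch with made-up data; the function name `pearson` and the sample values are assumptions for illustration only, and the scale factors used are positive.

```python
from math import sqrt

def pearson(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Illustrative data, not taken from the report.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

r_original = pearson(x, y)

# Shift and rescale each variable separately (positive scale factors).
x_shifted = [2.0 * v + 5.0 for v in x]
y_shifted = [0.1 * v - 3.0 for v in y]
r_transformed = pearson(x_shifted, y_shifted)

# Two exact lines with different slopes both give r = 1.
r_shallow = pearson(x, [2.0 * v for v in x])
r_steep = pearson(x, [10.0 * v for v in x])

print(r_original, r_transformed, r_shallow, r_steep)
```

Multiplying one variable by a negative factor would flip the sign of r while leaving its magnitude unchanged.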
Kendall's τ
Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation. To calculate τ for two variables X and Y, count the concordant and discordant pairs of observations; τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs.
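The pair-counting definition above corresponds to the variant usually called τ-a. The sketch below implements exactly that formula; the function name `kendall_tau_a` is an assumption for this example, and statistical libraries often compute a tie-adjusted variant instead, so results can differ when the data contain ties.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total number of pairs, with no tie adjustment."""
    concordant = 0
    discordant = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        sign = (x1 - x2) * (y1 - y2)
        if sign > 0:
            concordant += 1   # both variables move in the same direction
        elif sign < 0:
            discordant += 1   # the variables move in opposite directions
        # sign == 0 means a tie in x or y; it counts toward neither group
    n = len(x)
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs

# Four observations with one swapped pair: 5 concordant, 1 discordant.
x = [1, 2, 3, 4]
y = [1, 3, 2, 4]
tau = kendall_tau_a(x, y)
print(tau)
```

This direct implementation compares every pair of observations, so its cost grows quadratically with the number of rows; it is meant to show the definition rather than to be used on large data.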
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency, and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution.
A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
First rows
| | customerid | gross_revenue | recencydays | invoiceno | quantity | avg_ticket | frequency | qtd_return | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 17850 | 5391.21 | 372 | 34 | 6 | 18.152222 | 17.000000 | 40.0 | 0.019619 | 0.114478 |
| 1 | 13047 | 3237.54 | 31 | 10 | 12 | 18.822907 | 0.028302 | 36.0 | 0.007189 | 0.058140 |
| 2 | 12583 | 7281.38 | 2 | 15 | 25 | 29.479271 | 0.040323 | 51.0 | 0.002964 | 0.060729 |
| 3 | 13748 | 948.25 | 95 | 5 | 8 | 33.866071 | 0.017921 | 0.0 | 0.011390 | 0.178571 |
| 4 | 15100 | 876.00 | 333 | 3 | 2 | 292.000000 | 0.073171 | 22.0 | 0.037500 | 1.000000 |
| 5 | 15291 | 4668.30 | 25 | 15 | 17 | 45.323301 | 0.040115 | 29.0 | 0.007133 | 0.145631 |
| 6 | 14688 | 5630.87 | 7 | 21 | 24 | 17.219786 | 0.057221 | 399.0 | 0.005800 | 0.064220 |
| 7 | 17809 | 5411.91 | 16 | 12 | 23 | 88.719836 | 0.033520 | 42.0 | 0.005834 | 0.196721 |
| 8 | 15311 | 60767.90 | 0 | 91 | 43 | 25.543464 | 0.243316 | 474.0 | 0.002383 | 0.038251 |
| 9 | 16098 | 2005.63 | 87 | 7 | 15 | 29.934776 | 0.024390 | 0.0 | 0.011419 | 0.104478 |
Last rows
| | customerid | gross_revenue | recencydays | invoiceno | quantity | avg_ticket | frequency | qtd_return | avg_basket_size | avg_unique_basket_size |
|---|---|---|---|---|---|---|---|---|---|---|
| 5685 | 83700 | 4839.42 | 1 | 1 | 25 | 78.055161 | 1.0 | 0.0 | 0.000931 | 0.016129 |
| 5686 | 13298 | 360.00 | 1 | 1 | 2 | 180.000000 | 1.0 | 0.0 | 0.010417 | 0.500000 |
| 5687 | 14569 | 227.39 | 1 | 1 | 5 | 18.949167 | 1.0 | 0.0 | 0.012658 | 0.083333 |
| 5688 | 83704 | 17.90 | 1 | 1 | 1 | 2.557143 | 1.0 | 0.0 | 0.071429 | 0.142857 |
| 5689 | 83705 | 3.35 | 1 | 1 | 1 | 1.675000 | 1.0 | 0.0 | 0.500000 | 0.500000 |
| 5690 | 83706 | 6637.59 | 1 | 1 | 23 | 10.452898 | 1.0 | 0.0 | 0.000572 | 0.001575 |
| 5691 | 83707 | 7689.23 | 0 | 1 | 22 | 10.518782 | 1.0 | 0.0 | 0.000497 | 0.001368 |
| 5692 | 83708 | 3217.20 | 0 | 1 | 20 | 54.528814 | 1.0 | 0.0 | 0.001529 | 0.016949 |
| 5693 | 83709 | 5664.89 | 0 | 1 | 16 | 25.985734 | 1.0 | 0.0 | 0.001366 | 0.004587 |
| 5694 | 12713 | 848.55 | 0 | 1 | 9 | 22.330263 | 1.0 | 0.0 | 0.001969 | 0.026316 |